Goto

Collaborating Authors

 reference ligand




PhenoMoler: Phenotype-Guided Molecular Optimization via Chemistry Large Language Model

Song, Ran, Liu, Hui

arXiv.org Artificial Intelligence

Current molecular generative models primarily focus on improving drug-target binding affinity and specificity, often neglecting the system-level phenotypic effects elicited by compounds. Transcriptional profiles, as molecule-level readouts of drug-induced phenotypic shifts, offer a powerful opportunity to guide molecular design in a phenotype-aware manner. We present PhenoMoler, a phenotype-guided molecular generation framework that integrates a chemistry large language model with expression profiles to enable biologically informed drug design. By conditioning the generation on drug-induced differential expression signatures, PhenoMoler explicitly links transcriptional responses to chemical structure. By selectively masking and reconstructing specific substructures-scaffolds, side chains, or linkers-PhenoMoler supports fine-grained, controllable molecular optimization. Extensive experiments demonstrate that PhenoMoler generates chemically valid, novel, and diverse molecules aligned with desired phenotypic profiles. Compared to FDA-approved drugs, the generated compounds exhibit comparable or enhanced drug-likeness (QED), optimized physicochemical properties, and superior binding affinity to key cancer targets. These findings highlight PhenoMoler's potential for phenotype-guided and structure-controllable molecular optimization.


Flow-Based Fragment Identification via Binding Site-Specific Latent Representations

Neeser, Rebecca Manuela, Igashov, Ilia, Schneuing, Arne, Bronstein, Michael, Schwaller, Philippe, Correia, Bruno

arXiv.org Artificial Intelligence

Fragment-based drug design is a promising strategy leveraging the binding of small chemical moieties that can efficiently guide drug discovery. The initial step of fragment identification remains challenging, as fragments often bind weakly and non-specifically. We developed a protein-fragment encoder that relies on a contrastive learning approach to map both molecular fragments and protein surfaces in a shared latent space. The encoder captures interaction-relevant features and allows to perform virtual screening as well as generative design with our new method LatentFrag. In LatentFrag, fragment embeddings and positions are generated conditioned on the protein surface while being chemically realistic by construction. Our expressive fragment and protein representations allow location of protein-fragment interaction sites with high sensitivity and we observe state-of-the-art fragment recovery rates when sampling from the learned distribution of latent fragment embeddings. Our generative method outperforms common methods such as virtual screening at a fraction of its computational cost providing a valuable starting point for fragment hit discovery. We further show the practical utility of LatentFrag and extend the workflow to full ligand design tasks. Together, these approaches contribute to advancing fragment identification and provide valuable tools for fragment-based drug discovery.


Unified Guidance for Geometry-Conditioned Molecular Generation

Ayadi, Sirine, Hetzel, Leon, Sommer, Johanna, Theis, Fabian, Günnemann, Stephan

arXiv.org Artificial Intelligence

Effectively designing molecular geometries is essential to advancing pharmaceutical innovations, a domain, which has experienced great attention through the success of generative models and, in particular, diffusion models. However, current molecular diffusion models are tailored towards a specific downstream task and lack adaptability. We introduce UniGuide, a framework for controlled geometric guidance of unconditional diffusion models that allows flexible conditioning during inference without the requirement of extra training or networks. We show how applications such as structure-based, fragment-based, and ligand-based drug design are formulated in the UniGuide framework and demonstrate on-par or superior performance compared to specialised models. Offering a more versatile approach, UniGuide has the potential to streamline the development of molecular generative models, allowing them to be readily used in diverse application scenarios.


SynthFormer: Equivariant Pharmacophore-based Generation of Molecules for Ligand-Based Drug Design

Jocys, Zygimantas, Willems, Henriette M. G., Farrahi, Katayoun

arXiv.org Artificial Intelligence

Drug discovery is a complex and resource-intensive process, with significant time and cost investments required to bring new medicines to patients. Recent advancements in generative machine learning (ML) methods offer promising avenues to accelerate early-stage drug discovery by efficiently exploring chemical space. This paper addresses the gap between in silico generative approaches and practical in vitro methodologies, highlighting the need for their integration to optimize molecule discovery. We introduce SynthFormer, a novel ML model that utilizes a 3D equivariant encoder for pharmacophores to generate fully synthesizable molecules, constructed as synthetic trees. Unlike previous methods, SynthFormer incorporates 3D information and provides synthetic paths, enhancing its ability to produce molecules with good docking scores across various proteins. Our contributions include a new methodology for efficient chemical space exploration using 3D information, a novel architecture called Synthformer for translating 3D pharmacophore representations into molecules, and a meaningful embedding space that organizes reagents for drug discovery optimization. Synthformer generates molecules that dock well and enables effective late-stage optimization restricted by synthesis paths.


From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

Gao, Bowen, Tan, Haichuan, Huang, Yanwen, Ren, Minsi, Huang, Xiao, Ma, Wei-Ying, Zhang, Ya-Qin, Lan, Yanyan

arXiv.org Artificial Intelligence

Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability of the Vina docking score, the current standard for assessing binding abilities, is increasingly questioned due to its susceptibility to overfitting. To address these limitations, we propose a comprehensive evaluation framework that includes assessing the similarity of generated molecules to known active compounds, introducing a virtual screening-based metric for practical deployment capabilities, and re-evaluating binding affinity more rigorously. Our experiments reveal that while current SBDD models achieve high Vina scores, they fall short in practical usability metrics, highlighting a significant gap between theoretical predictions and real-world applicability. Our proposed metrics and dataset aim to bridge this gap, enhancing the practical applicability of future SBDD models and aligning them more closely with the needs of pharmaceutical research and development.


Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion

Gao, Bowen, Ren, Minsi, Ni, Yuyan, Huang, Yanwen, Qiang, Bo, Ma, Zhi-Ming, Ma, Wei-Ying, Lan, Yanyan

arXiv.org Artificial Intelligence

In the field of Structure-based Drug Design (SBDD), deep learning-based generative models have achieved outstanding performance in terms of docking score. However, further study shows that the existing molecular generative methods and docking scores both have lacked consideration in terms of specificity, which means that generated molecules bind to almost every protein pocket with high affinity. To address this, we introduce the Delta Score, a new metric for evaluating the specificity of molecular binding. To further incorporate this insight for generation, we develop an innovative energy-guided approach using contrastive learning, with active compounds as decoys, to direct generative models toward creating molecules with high specificity. Our empirical results show that this method not only enhances the delta score but also maintains or improves traditional docking scores, successfully bridging the gap between SBDD and real-world needs.